lemma 24
The Plug-in Approach for Average-Reward and Discounted MDPs: Optimal Sample Complexity Analysis
We study the sample complexity of the plug-in approach for learning $\varepsilon$-optimal policies in average-reward Markov decision processes (MDPs) with a generative model. The plug-in approach constructs a model estimate then computes an average-reward optimal policy in the estimated model. Despite representing arguably the simplest algorithm for this problem, the plug-in approach has never been theoretically analyzed. Unlike the more well-studied discounted MDP reduction method, the plug-in approach requires no prior problem information or parameter tuning. Our results fill this gap and address the limitations of prior approaches, as we show that the plug-in approach is optimal in several well-studied settings without using prior knowledge. Specifically it achieves the optimal diameter- and mixing-based sample complexities of $\widetilde{O}\left(SA \frac{D}{\varepsilon^2}\right)$ and $\widetilde{O}\left(SA \frac{\tau_{\mathrm{unif}}}{\varepsilon^2}\right)$, respectively, without knowledge of the diameter $D$ or uniform mixing time $\tau_{\mathrm{unif}}$. We also obtain span-based bounds for the plug-in approach, and complement them with algorithm-specific lower bounds suggesting that they are unimprovable. Our results require novel techniques for analyzing long-horizon problems which may be broadly useful and which also improve results for the discounted plug-in approach, removing effective-horizon-related sample size restrictions and obtaining the first optimal complexity bounds for the full range of sample sizes without reward perturbation.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)
Nonlinear spiked covariance matrices and signal propagation in deep neural networks
Wang, Zhichao, Wu, Denny, Fan, Zhou
Many recent works have studied the eigenvalue spectrum of the Conjugate Kernel (CK) defined by the nonlinear feature map of a feedforward neural network. However, existing results only establish weak convergence of the empirical eigenvalue distribution, and fall short of providing precise quantitative characterizations of the ''spike'' eigenvalues and eigenvectors that often capture the low-dimensional signal structure of the learning problem. In this work, we characterize these signal eigenvalues and eigenvectors for a nonlinear version of the spiked covariance model, including the CK as a special case. Using this general result, we give a quantitative description of how spiked eigenstructure in the input data propagates through the hidden layers of a neural network with random weights. As a second application, we study a simple regime of representation learning where the weight matrix develops a rank-one signal component over training and characterize the alignment of the target function with the spike eigenvector of the CK on test data.
- Africa > Middle East > Tunisia > Ben Arous Governorate > Ben Arous (0.04)
- North America > United States > New York (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- (3 more...)
Expressive Power of ReLU and Step Networks under Floating-Point Operations
Park, Yeachan, Hwang, Geonho, Lee, Wonyeol, Park, Sejun
The study of the expressive power of neural networks has investigated the fundamental limits of neural networks. Most existing results assume real-valued inputs and parameters as well as exact operations during the evaluation of neural networks. However, neural networks are typically executed on computers that can only represent a tiny subset of the reals and apply inexact operations. In this work, we analyze the expressive power of neural networks under a more realistic setup: when we use floating-point numbers and operations. Our first set of results assumes floating-point operations where the significand of a float is represented by finite bits but its exponent can take any integer value. Under this setup, we show that neural networks using a binary threshold unit or ReLU can memorize any finite input/output pairs and can approximate any continuous function within a small error. We also show similar results on memorization and universal approximation when floating-point operations use finite bits for both significand and exponent; these results are applicable to many popular floating-point formats such as those defined in the IEEE 754 standard (e.g., 32-bit single-precision format) and bfloat16.
A Finer Calibration Analysis for Adversarial Robustness
Awasthi, Pranjal, Mao, Anqi, Mohri, Mehryar, Zhong, Yutao
W e present a more general analysis of H -calibration for adversarially robust classification. By adopting a finer definition of calibration, we can cover setti ngs beyond the restricted hypothesis sets studied in previous work. In particular, our results ho ld for most common hypothesis sets used in machine learning. W e both fix some previous calibration re sults ( Bao et al., 2020) and generalize others ( A wasthi et al., 2021). Moreover, our calibration results, combined with the pre vious study of consistency by A wasthi et al. ( 2021), also lead to more general H -consistency results covering common hypothesis sets.